DETAILED ACTION



Response to Amendment 



1. Claims 1-19 are pending.

2. Claims 1-2, 4-5, 8-13, and 16-18 have been amended.

3. Claim 19 has been added.



Continued Examination Under 37 CFR 1.114 



4. A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 
3/4/2009 has been entered. 

Response to Arguments 

5. In response to applicant's argument that the references fail to show certain 
features of applicant's invention, it is noted that the features upon which applicant relies 
(i.e., that Chou does not provide any suggestion or disclosure of a detected key-phrase or
subword sequence being a "spoken event of interest to be located in unknown speech,"
Remarks, Pages 7-8, ¶ 4 and ¶ 1) are not recited in the rejected claim(s). Although the
claims are interpreted in light of the specification, limitations from the specification are
not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed.
Cir. 1993).

6. In response to applicant's argument that the references fail to show certain 
features of applicant's invention, it is noted that the features upon which applicant relies 
(i.e., Foote does not specify the form of the "desired query words." Foote provides no
hint or disclosure that the "desired query words" to be located in the phone lattice are
provided in a spoken form. (Remarks, Page 8, ¶ 2)) are not recited in the rejected
claim(s). Although the claims are interpreted in light of the specification, limitations from
the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26
USPQ2d 1057 (Fed. Cir. 1993).

7. In response to applicant's argument that "one of ordinary skill in the art would not 
have been motivated to combine the teachings of Chou, which are directed to speech 
utterance understanding with the teachings of Foote which are directed to audio 
information retrieval" (Remarks, Page 9, ¶ 1), the fact that applicant has recognized
another advantage which would flow naturally from following the suggestion of the prior 
art cannot be the basis for patentability when the differences would otherwise be 
obvious. See Ex parte Obiaya, 227 USPQ 58, 60 (Bd. Pat. App. & Inter. 1985). 

8. In response to applicant's argument that the references fail to show certain 
features of applicant's invention, it is noted that the features upon which applicant relies 
(i.e., Further, even if the Chou system is modified to include the audio information 
retrieval... the resultant system still does not perform the recited functions of ... 
(Remarks, Page 9, ¶ 1)) are not recited in the rejected claim(s). Although the claims are



Application/Control Number: 10/565,570 Page 4 

Art Unit: 2626 

interpreted in light of the specification, limitations from the specification are not read into 
the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). 

Claim Rejections - 35 USC §112 

The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

9. Claims 1, 2, 4, 8, 10, 11, 17, 18, and 19 are rejected under 35 U.S.C. 112, first
paragraph, as failing to comply with the written description requirement. The claim(s)
contain subject matter which was not described in the specification in such a way as to
reasonably convey to one skilled in the relevant art that the inventor(s), at the time the
application was filed, had possession of the claimed invention. The term "specification"
is not defined in the specification in a manner that would allow an individual to
understand the metes and bounds of the claim language.

Claim Rejections - 35 USC § 101 
35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 



10. Claims 1-3, 5-16, and 18-19 are rejected under 35 U.S.C. 101 because the
claimed invention is directed to non-statutory subject matter.

11. As per claims 1-3 and 5-16, under the most recent interpretation of the Interim
Guidelines regarding 35 U.S.C. 101, a method claim must (1) be tied to another statutory
class or (2) transform underlying subject matter to a different state or thing. If no
transformation occurs, the claim(s) should positively recite the other statutory class to
which it is tied to qualify as a statutory process under 35 U.S.C. 101. For guidance on
areas of statutory subject matter, see the 35 U.S.C. 101 Interim Guidelines (with emphasis
on the Clarification of "processes" under 35 USC 101). As an example, the claim(s)
could identify the apparatus that accomplishes the method steps, or positively recite the
subject matter that is being transformed. As per independent claim 1, the method
may be interpreted as a human performing the method of mentally defining a spoken
event using subword units and recognizing the location of the events in an audio signal.
Dependent claims 2-3 and 5-16 fail to tie the method to a statutory apparatus.

12. Claims 18-19 are also non-statutory under the most recent interpretation of the
Interim Guidelines regarding 35 U.S.C. 101 because, although these claims comprise
"system" type elements, these elements are disclosed in the specification (Page 11,
¶ 059) as a software embodiment, and when treated as a whole, the claims are directed
more toward a non-statutory embodiment and not necessarily a hardware embodiment.

Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains.
Patentability shall not be negatived by the manner in which the invention was made. 

7. Claims 1-19 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Chou et al. (US Patent #5797123 hereinafter Chou) in view of Foote et al. (NPL 
document "Unconstrained keyword spotting using phone lattices with application to 
spoken document retrieval"). 

As per claim 1, Chou teaches:

forming a specification of a spoken event of interest to be located in unknown 
speech according to a plurality of sequences of subword units representing the spoken 
event of interest, wherein the forming includes identifying one or more instances of the 
spoken event of interest in a first set of audio signals and representing each identified 
instance of the spoken event of interest in the specification using at least one of the 
plurality of sequences of subword units; (Chou, column 4, lines 30-42 and
column 6, lines 35-57, Fig. 2; the recognition is based on subword models, which are
compiled into networks (specification).)

Chou fails to teach, but Foote teaches: 

accepting data representing the unknown speech in a second audio signal; 
(Foote, Page 218, ¶ 2, ...Most of the time-consuming speech recognition must be done
off-line, as messages are added to the archive... The data is input to a speech 
recognizer, which converts from audio to text, prior to archiving.) 

locating putative instances of the spoken event of interest in the second audio
signal using the specification of the spoken event of interest, wherein the locating 
includes identifying time locations of the second audio signal at which the spoken event 
of interest is likely to have occurred based on a comparison of the data representing the 
unknown speech with the specification of the spoken event of interest, query in the 
second speech data using the determined representation of the query. 
(Foote, Page 208, Fig. 2 and ¶ 4, ...These multiple hypotheses can be stored as a
phone lattice which is a directed acyclic graph whose edges represent hypothesized
phone occurrences and whose nodes represent the corresponding start and end
times... Section 3.5 on Pages 214 and 215 shows the keyword spotting using phone
lattices.)

It would have been obvious to someone of ordinary skill in the art at the time of
the invention to combine Foote with the Chou device to provide Foote with a
multi-modal input modality for searching. It would have been obvious to do so because
Petkovic et al. (US Patent #6185527) similarly provides a multimodal keyword spotting
algorithm for information retrieval (column 6, lines 40-47); therefore, it would have
been well known in the art to do so.

As per claim 2, claim 1 is incorporated and Chou further teaches the method 
comprising: 

wherein forming the specification of the spoken event of interest comprises 
applying a computer-implemented speech recognition algorithm to data representing the 
first set of audio signals. (Chou, column 4, lines 30-42, ...subword-based
speech recognition...)



As per claim 3, claim 1 is incorporated and Chou further teaches the method 
comprising: 

wherein the subword units include linguistic units. (Chou, column 4, lines 23-33,
...syllables, demisyllables, or phonemes...)



As per claim 4, claim 2 is incorporated and Chou fails to specifically teach, but Foote 
teaches: 

wherein locating the putative instances includes applying a computer- 
implemented word spotting algorithm configured using the specification of the spoken 
event of interest. (Foote, section 2 uses phone lattices (specification) for word
spotting.)

It would have been obvious to someone of ordinary skill in the art at the time of
the invention to combine Foote with the Chou device to provide Foote with a
multi-modal input modality for searching. It would have been obvious to do so because
Petkovic et al. (US Patent #6185527) similarly provides a multimodal keyword spotting
algorithm for information retrieval (column 6, lines 40-47); therefore, it would have
been well known in the art to do so.

As per claim 5, claim 4 is incorporated and Chou further teaches the method
comprising: 

selecting processing parameter values of the speech recognition algorithm for 
application to the data representing the first set of audio signals according to 
characteristics of the word spotting algorithm. (Chou, column 5, lines 27-49,
...the key-phrases may be defined so as to directly correspond with semantic slots in a
semantic frame, such as, for example, a time and a place.... the top-down key-phrases 
recognized by the instant illustrative embodiment may easily be directly mapped into 
semantic representations..., the key-phrase detector tags detected phrases with 
conceptual information for further consideration by the speech recognition algorithm.) 

As per claim 6, claim 5 is incorporated and Chou further teaches the method 
comprising: 

wherein the selecting of the processing parameter values of the speech 
recognition algorithm includes optimizing said parameters according to an accuracy of 
the word spotting algorithm. (Chou, column 5, lines 60-67, ...conventional
minimum classification error (MCE) criterion, familiar to those skilled in the art...)

As per claim 7, claim 5 is incorporated and Chou further teaches the method 
comprising: 

wherein the selecting of the processing parameter values of the speech 
recognition algorithm includes selecting values for parameters including one or more of 
an insertion factor, a recognition search beam width, a recognition grammar factor, and
a number of recognition hypotheses. (Chou, column 6, lines 35-57,
...grammars may be manually derived directly from the task specification, or,
alternatively, they may be generated automatically or semi-automatically (i.e., with 
human assistance) from a small corpus, using conventional training procedures familiar 
to those skilled in the art...) 

As per claim 8, claim 1 is incorporated and Chou further teaches the method 
comprising: 

wherein the specification of the spoken event of interest defines a network of 
subword units. (Chou, column 6, lines 57-60, ...the key-phrase and filler-phrase
grammars are compiled into networks...; column 5, lines 50-67, ...key-phrase
detector 11 comprises a subword-based speech recognizer adapted to recognize a set
of key-phrases using a set of phrase sub-grammars...)

As per claim 9, claim 8 is incorporated and Chou fails to specifically teach, but Foote 
teaches: 

wherein the network of subword units is formed by multiple sequences of 
subword units that correspond to different paths through the network. 
(Foote, Fig. 1, section 2, there are multiple phone paths through the lattice.) 

It would have been obvious to someone of ordinary skill in the art at the time of
the invention to combine Foote with the Chou device to provide Foote with a
multi-modal input modality for searching. It would have been obvious to do so because
Petkovic et al. (US Patent #6185527) similarly provides a multimodal keyword spotting
algorithm for information retrieval (column 6, lines 40-47); therefore, it would have
been well known in the art to do so.

As per claim 10, claim 1 is incorporated and Chou further teaches the method 
comprising: 

wherein forming the specification of the spoken event of interest includes 
determining an n-best list of recognition results. (Chou, column 7, lines 47-57, 
...N-best key-phrase candidates in the order of their scores...)

As per claim 11, claim 10 is incorporated and Chou fails to specifically teach, but Foote
teaches: 

wherein each sequence of subword units in the specification corresponds to a 
different one in the n-best list of recognition results. (Foote, Fig. 1, Pages 208
and 209, Sections 2 and 2.1 teach that the expectations of the phone sequences are
maximized by having multiple hypotheses (paths). Those hypotheses are generated and
ranked in an N-best list.)

It would have been obvious to someone of ordinary skill in the art at the time of
the invention to combine Foote with the Chou device to provide Foote with a
multi-modal input modality for searching. It would have been obvious to do so because
Petkovic et al. (US Patent #6185527) similarly provides a multimodal keyword spotting
algorithm for information retrieval (column 6, lines 40-47); therefore, it would have
been well known in the art to do so.

As per claim 12, claim 1 is incorporated and Chou further teaches the method 
comprising: 

accepting first audio data representing utterances of the event of interest spoken 
by a user, and processing the first audio data to form a processed query. 
(Chou, column 3, lines 49-52, ...These key-phrases are then verified by assigning 
confidence measures thereto and comparing the confidence measures to a threshold, 
resulting in a set of verified key-phrase candidates...)

As per claim 13, claim 1 is incorporated and Chou teaches: 

accepting a selection by a user of portions of stored data from the first set of 
audio signals, and processing the portions of the stored data to form a processed query. 
(Chou, column 6, lines 35-65) 

As per claim 14, claim 13 is incorporated and Chou teaches: 

prior to accepting the selection by the user, processing the first set of audio 
signals according to a first computer-implemented speech recognition algorithm to 
produce the stored data. (Chou, column 6, lines 35-65) 

As per claim 15, claim 14 is incorporated and Chou further teaches the method
comprising: 

the first speech recognition algorithm produces data related to presence of the 
subword units at different times in first set of audio signals. (Chou, column 5,
lines 60-65, ...The subword model recognizer employed by key-phrase detector 11
uses lexicon 23 and subword models 22... For a full signal to be analyzed, there must
be subword units for each definable subword unit meaning in the phrase, and the
phrase would extend over a period of time; thus the subword units would as well.
Furthermore, the speech recognition algorithm would inherently produce data from the
subword units, so it would also produce data related to presence of the subword units
at different times in the audio signal.)

As per claim 16, claim 14 is incorporated and Chou teaches: 

applying a second speech recognition algorithm to the processed query. 
(Chou, Fig. 2, column 7, lines 15-39)

Claims 17 and 18 are the software and hardware representations of the method 
as claimed in claim 1. Claims 17 and 18 are rejected under the same principles as claim 
1 for having identical limitations. (Chou, column 12, lines 6-45, ...Illustrative
embodiments of the present invention may comprise digital signal processor (DSP)
hardware, read-only memory (ROM) for storing software performing the operations
discussed above, and random access memory (RAM) for storing results. Very large
scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in
combination with a general purpose processor or DSP circuit, may also be provided...)
Chou provides software and hardware illustrative embodiments which teach both claims
17 and 18.

As per claim 19, claim 18 is incorporated and Chou fails to specifically teach, but Foote
teaches: 

wherein the word spotter is further configured to identify time locations of the 
second audio signal at which the spoken event of interest is likely to have occurred 
based on a comparison of the data representing the unknown speech with the 
specification of the spoken event of interest. (Foote, Page 208, Fig. 2 and ¶ 4,
...These multiple hypotheses can be stored as a phone lattice which is a directed acyclic
graph whose edges represent hypothesized phone occurrences and whose nodes
represent the corresponding start and end times... Section 3.5 on Pages 214 and 215
shows the keyword spotting using phone lattices.)

Conclusion 

13. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Refer to PTO-892, Notice of References Cited, for a listing of 
analogous art. 

14. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to GREG A. BORSETTI whose telephone number is 
(571) 270-3885. The examiner can normally be reached on Monday - Thursday (8am -
5pm Eastern Time). 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, RICHEMOND DORVIL, can be reached on 571-272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Greg A. Borsetti/ 
Examiner, Art Unit 2626 

/Talivaldis Ivars Smits/ 
Primary Examiner, Art Unit 2626 

4/7/2009 



